Improving Cluster Method Quality by Validity Indices

Authors

  • Narjes Hachani
  • Habib Ounelli
Abstract

Clustering attempts to discover the significant groups present in a data set. Because it is an unsupervised process, it is difficult to define when a clustering result is acceptable. Several cluster validity indices have therefore been developed to evaluate the quality of the results produced by clustering algorithms. In this paper, we propose to improve the quality of a clustering algorithm called "CLUSTER" by using a validity index. CLUSTER is an automatic clustering technique that can identify situations in which the data have no natural clusters. However, CLUSTER has drawbacks: in several cases it generates small clusters that are not well separated. Extending CLUSTER with validity indices overcomes these drawbacks. We propose four extensions of CLUSTER, based on four validity indices: Dunn, DunnRNG, DB, and DB*. These extensions provide an adequate number of clusters. Experimental results on real data show that these algorithms improve the clustering quality of CLUSTER. In particular, the new clustering algorithm based on the DB* index is more effective than the other algorithms.
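The abstract names the Dunn and DB indices without defining them. The following is a minimal sketch, assuming the standard textbook definitions, Euclidean distance, and that DB denotes the Davies-Bouldin index. It does not implement CLUSTER, DunnRNG, or DB*, and the function names and sample points are illustrative, not taken from the paper. A higher Dunn value and a lower DB value both indicate compact, well-separated clusters.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def dunn_index(clusters):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    return min_separation / max_diameter if max_diameter > 0 else float("inf")


def davies_bouldin_index(clusters):
    """Mean, over clusters, of the worst (scatter_i + scatter_j) / centroid distance."""
    centers = [centroid(c) for c in clusters]
    # Scatter: average distance from each point to its own cluster centroid.
    scatter = [
        sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, centers)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Hypothetical data: two compact squares of points that lie far apart.
square_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
square_b = [(10, 10), (10, 11), (11, 10), (11, 11)]

good = [square_a, square_b]                                # the natural grouping
poor = [[(0, 0), (0, 1)], [(1, 0), (1, 1)], square_b]     # square_a split in two

for name, partition in (("good", good), ("poor", poor)):
    print(name, dunn_index(partition), davies_bouldin_index(partition))
```

On this toy data the natural two-cluster partition scores a Dunn index of about 9.0 and a DB index of about 0.1, while the over-split partition scores worse on both. Comparing candidate partitions in this way is the general idea behind using a validity index to settle on a number of clusters.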

Similar resources

Designing the Questionnaire of Teachers’ Work Life Quality

Background and Objectives: Quality of work life is one of the most important factors in the promotion of teachers and in keeping them in their jobs. This study aimed to design and evaluate a questionnaire on teachers' work life quality. Methods: In this research, a sequential exploratory approach (instrument editing model) was used and in the qualitative stage, a semi-structured interview...

Improving the Efficiency of Clustering by Using an Enhanced Clustering Methodology

In data analysis, clustering means that data with similar features are grouped together in the same cluster. Each cluster consists of data that are similar to one another and dissimilar to the data of other clusters. From a machine learning perspective, clustering can be viewed as a form of unsupervised learning. In this paper, we have proposed an Enhanced Clustering Methodology to obt...

Intra-cluster Similarity Index Based on Fuzzy Rough Sets for Fuzzy C-Means Algorithm

Cluster validity indices have been used to evaluate the quality of fuzzy partitions. In this paper, we propose a new index, which uses concepts from fuzzy rough sets to evaluate the average intra-cluster similarity of fuzzy clusters produced by the fuzzy c-means algorithm. Experimental results show that, compared with several well-known cluster validity indices, the proposed index can yield more...

Selecting Ensemble Members in Ensemble Clustering Using Voting

Clustering is the process of dividing a dataset into subsets called clusters, so that objects within a cluster are similar to each other and different from objects in other clusters. So far, many algorithms based on different approaches have been created for clustering. An effective choice is to combine two or more of these algorithms to solve the clustering problem. Ensemb...

A Bounded Index for Cluster Validity

Clustering is one of the best-known types of unsupervised learning. Evaluating the quality of results and determining the number of clusters in data are important issues. Most current validity indices cover only a subset of the important aspects of clusters. Moreover, these indices are relevant only for data sets containing at least two clusters. In this paper, a new bounded index for cluster...

Publication date: 2007